An empirical method for identifying and translating technical terminology

Author

  • Sayori Shimohata
Abstract

This paper describes a method for retrieving patterns of words and expressions frequently used in a specific domain and building a dictionary for machine translation (MT). The method uses an untagged text corpus in retrieving word sequences and simplified part-of-speech templates in identifying their syntactic categories. The paper presents experimental results for applying the words and expressions to a pattern-based machine translation system.

1 Introduction

There has been a continuous interest in corpus-based approaches which retrieve words and expressions in connection with a specific domain (we call them technical terms hereafter). They may correspond to syntactic phrases or components of syntactic relationships and have been found useful in various application areas, including information extraction, text summarization, and machine translation. Among others, a knowledge of technical terminology is indispensable for machine translation because the usage and meaning of technical terms are often quite different from their literal interpretation.

One approach for identifying technical terminology is a rule-based approach which learns local syntactic patterns from a training corpus. A variety of methods have been developed within this framework (Ramshaw, 1995) (Argamon et al., 1999) (Cardie and Pierce, 1999) and achieved good results for the considered task. Surprisingly, though, little work has been devoted to learning local syntactic patterns besides noun phrases. Another drawback of this approach is that it requires substantial training corpora, in many cases with part-of-speech tags.

An alternative approach is a statistical one which retrieves recurrent word sequences as collocations (Smadja, 1993) (Haruno et al., 1996) (Shimohata et al., 1997).
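As a rough illustration of the statistical approach, the following Python sketch counts word n-grams in plain, untagged text and keeps those that recur. The function name `recurrent_ngrams`, the length range, the frequency threshold, and the toy corpus are all assumptions made for illustration; the cited collocation methods may rely on more refined statistics than a raw count cutoff.

```python
from collections import Counter
import re


def recurrent_ngrams(text, min_n=2, max_n=4, min_count=2):
    """Return word n-grams (as tuples) that occur at least min_count times.

    Hypothetical helper: a plain frequency cutoff stands in for the
    collocation statistics used by the cited methods.
    """
    counts = Counter()
    # Naive sentence split so that n-grams never cross a sentence boundary.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence)
        for n in range(min_n, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy corpus (invented for this sketch, not taken from the paper).
corpus = (
    "Click the print button to open the dialog box. "
    "The dialog box shows the print button. "
    "Close the dialog box."
)

result = recurrent_ngrams(corpus)
for gram, count in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
    print(" ".join(gram), count)
```

Note that the output mixes noun phrases such as "dialog box" with fragments such as "the dialog", since raw n-gram counts carry no syntactic information.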
The statistical approach is robust and practical because it uses plain text corpora, without any information dependent on a language. Unlike the former approach, this approach extracts various types of local patterns at the same time. Therefore, post-processing, such as part-of-speech tagging and syntactic category identification, is necessary when we apply them to NLP applications.

This paper presents a method for identifying technical terms from a corpus and applying them to a machine translation system. The proposed method retrieves local patterns by utilizing n-gram statistics and identifies their syntactic categories with simple part-of-speech templates. We make a machine translation dictionary from the retrieved patterns and translate documents in the same domain as the original corpus.

In the next section, we briefly describe pattern-based machine translation. The following section explains how the proposed method works in detail. We then present experimental results and conclude with a discussion.

2 Pattern-based MT system

A pattern-based MT system uses a set of bilingual patterns (CFG rules) (Abeille et al., 1990) (Takeda, 1996) (Shimohata et al., 1999). In the parsing process, the engine performs CFG parsing for an input sentence and rewrites trees by applying the source patterns. Terminals and non-terminals are processed under the same framework, but lexicalized patterns have priority over symbolized patterns (see footnote 1). A plausible parse ...

Footnote 1: We define a symbolized pattern as a pattern without a terminal and a lexicalized pattern as that with more than one terminal. We prepare 1000 symbolized patterns and 130,000 lexicalized patterns as a system ...
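The priority of lexicalized over symbolized patterns described in Section 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual engine: the `Pattern` class, the `matches` and `best_pattern` helpers, the convention that uppercase symbols are non-terminals, the placeholder target sides, and the choice to rank candidates by their number of terminals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    source: tuple  # source-side right-hand side; uppercase symbols are non-terminals
    target: tuple  # target-side right-hand side (placeholders in this sketch)

    def terminal_count(self):
        # Terminals are the lowercase (lexical) items on the source side.
        return sum(1 for sym in self.source if not sym.isupper())


def matches(pattern, tokens):
    """tokens is a list of (word, category) pairs covering one span."""
    if len(pattern.source) != len(tokens):
        return False
    for sym, (word, category) in zip(pattern.source, tokens):
        expected = category if sym.isupper() else word
        if sym != expected:
            return False
    return True


def best_pattern(patterns, tokens):
    # Assumption: among matching patterns, prefer the one with the most
    # terminals, so any lexicalized pattern beats a symbolized one.
    candidates = [p for p in patterns if matches(p, tokens)]
    return max(candidates, key=Pattern.terminal_count, default=None)


# Hypothetical rules; "<...>" marks a target-language placeholder.
patterns = [
    Pattern(("V", "N"), ("N", "V")),                # symbolized
    Pattern(("take", "N"), ("N", "<take>")),        # one terminal
    Pattern(("take", "effect"), ("<take effect>",)),  # fully lexicalized
]

chosen = best_pattern(patterns, [("take", "V"), ("effect", "N")])
print(chosen.source)
```

With these rules, a span that no lexicalized pattern covers still falls back to the symbolized rule, which is the behavior a general grammar is meant to provide.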

Similar resources

Equivalence in Technical Texts: The Case of Accounting Terms in English-Persian Dictionaries

Translating accounting documents in general, and accounting terminology in particular, is not a simple task, especially when new terms keep being created in pace with accounting developments. This study was carried out to find the most common and preferable ways to translate accounting terms from English into Persian. Also, an attempt was made to identify the frequently used patterns of word-fo...

An Analysis of Effective Factors on the Technical Efficiency of Health Production in the OIC Countries

The importance of community health, together with the consideration of endogenous growth models, has led countries to increase health expenditure to speed up economic growth and development. This has made the efficiency of the health production function an essential issue, especially in developing countries. Based upon this, the present study, employing the stochastic frontier analysi...

An integrated fuzzy multiple objective decision framework to optimal fulfillment of engineering characteristics in quality function development

Quality function development (QFD) is a planning tool used to fulfill customer expectations and a systematic process for translating customer requirements (WHATs) into technical descriptions (HOWs). QFD aims to maximize customer satisfaction related to enterprise satisfaction. The inherent fuzziness of relationships in QFD modeling justifies the use of fuzzy regression for estimating the r...

Publication date: 2000